A. The Tool-Calling Loop

How LLMs go from words to actions

Agenda

  • A. The Loop — How tool calling works end-to-end
  • B. Schema Design — Crafting schemas LLMs can use reliably
  • C. The Integration — Wiring tools into the OpenAI API
  • D. Error Handling — Building resilient tool execution
  • E. Wrap-up — Key takeaways & lab preview

From Chat to Action

LLMs predict text — they cannot do anything on their own.

Function calling bridges this gap: the model outputs a structured JSON request, your code executes it, and the result feeds back into the conversation.

Key Insight

The LLM never runs code. It asks your system to run code by producing a structured tool call.

The Tool-Calling Cycle

sequenceDiagram
    participant U as User
    participant L as LLM
    participant S as Your System
    participant T as Tool Function

    U->>L: "What is 15% of 500?"
    L->>S: tool_call: execute_calculation(multiply, 500, 0.15)
    S->>T: execute_calculation(multiply, 500, 0.15)
    T-->>S: {"result": 75.0, "success": true}
    S->>L: tool result: 75.0
    L->>U: "15% of 500 is 75."

Terminology: Functions vs Tools

Function Calling

  • Original term (OpenAI, 2023)
  • Describes the API pattern
  • Developer defines function schemas
  • Model outputs structured JSON

Tool Calling

  • Broader, current term
  • OpenAI API uses tools parameter
  • Tool = the interface (JSON schema)
  • Function = the backend code

Both terms describe the same mechanism. The industry is converging on “tool calling”.

The Standard: JSON Schema

Every tool is described using JSON Schema — the common language models understand.

calculator_schema = {
    "type": "function",
    "function": {
        "name": "execute_calculation",
        "description": "Executes a basic arithmetic operation.",
        "parameters": {
            "type": "object",
            "properties": {
                "operation": {
                    "type": "string",
                    "enum": ["add", "subtract", "multiply", "divide", "pow"]
                },
                "operand_a": {"type": "number"},
                "operand_b": {"type": "number"}
            },
            "required": ["operation", "operand_a", "operand_b"]
        }
    }
}

B. Schema Design

Crafting schemas LLMs can use reliably

Description Engineering

The description field is a prompt to the LLM. It determines when the model calls your tool.

Bad description:

"description": "Does math"

Good description:

"description": "Executes arithmetic
operations. Use for percentages,
growth rates, compound interest,
or simple arithmetic. Example:
15% of 200  multiply(200, 0.15)"

The Rule

Complete the sentence: “Use this function to…” — if the description doesn’t guide the model, the tool won’t be called.

Enums vs Free Text

User: "add 5 and 3" → which schema type for operation?

  • string → ambiguous output: "plus", "Add", "addition"
  • enum → precise output: "add"

The Rule

Use enum whenever there is a fixed set of valid values. This is the single biggest improvement to tool-calling accuracy.
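The enum in the schema constrains the model, but it is worth enforcing server-side too. A minimal sketch, assuming the operation set from the calculator schema earlier (`ALLOWED_OPERATIONS` and `validate_operation` are illustrative names):

```python
# Mirror the schema's enum on the server side so a stray value
# ("plus", "Add") is rejected before the tool runs.
ALLOWED_OPERATIONS = {"add", "subtract", "multiply", "divide", "pow"}

def validate_operation(args: dict) -> str:
    """Return the operation if it is in the enum, else raise."""
    op = args.get("operation")
    if op not in ALLOWED_OPERATIONS:
        raise ValueError(
            f"Invalid operation {op!r}; expected one of {sorted(ALLOWED_OPERATIONS)}"
        )
    return op
```

The raised error can be converted into a structured `{success: False, ...}` return at the execution layer.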

Nested Objects & Required Fields

LLMs handle structured nested objects well — don’t flatten everything.

# Good: structured nested object
"parameters": {
    "type": "object",
    "properties": {
        "location": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "country": {"type": "string", "enum": ["SA", "US", "UK"]}
            },
            "required": ["city"]
        },
        "price_range": {
            "type": "string",
            "enum": ["budget", "mid", "luxury"]
        }
    },
    "required": ["location"]    # Be explicit!
}

Omitting required leads to partial calls that fail silently.
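A minimal guard against those silent partial calls, assuming a helper (`missing_required` is an illustrative name) that is run before the tool executes:

```python
# Sketch: list any required parameters the model failed to supply.
# If the list is non-empty, return a structured error instead of
# executing the tool.
def missing_required(args: dict, required: list[str]) -> list[str]:
    """Return the required keys absent from the parsed arguments."""
    return [key for key in required if key not in args]
```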

Schema Design Checklist

  • Verb names: get_weather, execute_query, search_hotels
  • Rich descriptions: include purpose, examples, edge cases
  • Enums over strings: constrain values wherever possible
  • Explicit required: list every mandatory parameter
  • Structured objects: use nesting for related parameters

C. The Integration

Wiring tools into the OpenAI API

The API Call

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    tools=get_tool_schemas(),    # Inject tool definitions
    tool_choice="auto",          # Let model decide
    temperature=0.1              # Low temp → deterministic
)
  • tools — list of JSON Schema tool definitions
  • tool_choice="auto" — model decides whether to call a tool
  • temperature=0.1 — lower = more reliable tool use

Parsing Tool Calls

When the model decides to use a tool, the response contains tool_calls:

response_message = response.choices[0].message

if response_message.tool_calls:
    for tool_call in response_message.tool_calls:
        name = tool_call.function.name          # "execute_calculation"
        args = json.loads(
            tool_call.function.arguments        # '{"operation":"multiply",...}'
        )
        tool_id = tool_call.id                  # "call_abc123"

        # Execute your function
        result = execute_tool(name, args)

Note

Always wrap json.loads() in try/except — models occasionally produce malformed JSON.
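A minimal sketch of that defensive parse (`parse_tool_args` is an illustrative name), returning `None` on malformed output instead of raising:

```python
import json

def parse_tool_args(raw: str):
    """Parse tool-call arguments, tolerating malformed model output."""
    try:
        args = json.loads(raw)
    except json.JSONDecodeError:
        return None
    # Arguments should always be a JSON object, not a bare value
    return args if isinstance(args, dict) else None
```

A `None` result can be fed back to the model as a structured error asking it to retry the call.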

Feeding Results Back

The model needs to see the tool result to formulate its final answer.

# Append tool result to conversation
messages.append({
    "role": "tool",
    "tool_call_id": tool_call.id,       # Must match!
    "content": json.dumps(result)
})

# Second API call — model sees the result
final = client.chat.completions.create(
    model=model, messages=messages
)

The Complete Flow

  1. User message arrives.
  2. API call 1: send messages + tools.
  3. Did the model return tool_calls? If no, return the text response.
  4. If yes, parse tool_calls and execute each tool.
  5. Append tool results to messages.
  6. API call 2: send the updated messages.
  7. Return the final response.
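The whole flow can be sketched as one helper. This is a minimal sketch, not a production implementation: `run_with_tools` and `tool_executor` are illustrative names, and the client is assumed to expose the `chat.completions.create` interface used earlier.

```python
import json

def run_with_tools(client, model, messages, tools, tool_executor):
    """One round of the two-call pattern: request, execute, respond."""
    first = client.chat.completions.create(
        model=model, messages=messages, tools=tools, tool_choice="auto"
    )
    msg = first.choices[0].message
    if not msg.tool_calls:
        return msg.content                       # No tool needed
    messages.append(msg)                         # Keep the assistant turn
    for tc in msg.tool_calls:
        result = tool_executor(tc.function.name,
                               json.loads(tc.function.arguments))
        messages.append({"role": "tool",
                         "tool_call_id": tc.id,  # Must match the call
                         "content": json.dumps(result)})
    final = client.chat.completions.create(model=model, messages=messages)
    return final.choices[0].message.content
```

`tool_executor` here is any callable mapping `(name, args)` to a result dict, such as a dispatcher over your registered tool functions.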

D. Error Handling

Building resilient tool execution

Why Tools Fail

Real-world tools call databases, APIs, and filesystems — all unreliable.

Common failures:

  • Network timeouts
  • Invalid arguments from LLM
  • Rate limits exceeded
  • External service downtime
  • Division by zero, type errors

Without handling:

  • Agent crashes mid-conversation
  • User sees raw stack traces
  • No way to recover or retry
  • Lost trust in the system

Structured Error Returns

The tool always returns a dict — never raises an uncaught exception.

def execute_calculation(operation, operand_a, operand_b):
    try:
        if operation == "divide" and operand_b == 0:
            return {"success": False,
                    "error": "Division by zero is not allowed.",
                    "result": None}
        operations = {"add": lambda a, b: a + b,
                      "subtract": lambda a, b: a - b,
                      "multiply": lambda a, b: a * b,
                      "divide": lambda a, b: a / b,
                      "pow": lambda a, b: a ** b}
        result = operations[operation](operand_a, operand_b)
        return {"success": True, "result": result, "error": None}
    except Exception as e:
        return {"success": False,
                "error": f"Calculation error: {str(e)}",
                "result": None}

The LLM receives the error in the tool message and explains it to the user naturally.

The Resilient API Call Decorator

For external APIs, add retries with exponential backoff:

from requests.exceptions import Timeout
from tenacity import retry, stop_after_attempt, wait_exponential

def resilient_api_call(max_retries=2, timeout_seconds=10):
    def decorator(func):
        # Retry the raw call so exceptions actually reach tenacity;
        # reraise=True surfaces the final failure once retries run out.
        retrying = retry(
            stop=stop_after_attempt(max_retries + 1),
            wait=wait_exponential(multiplier=1, min=2, max=10),
            reraise=True,
        )(func)

        def wrapper(*args, **kwargs):
            try:
                return retrying(*args, **kwargs)
            except Timeout:
                return {"success": False,
                        "error": f"Timed out after {timeout_seconds}s"}
            except Exception as e:
                return {"success": False, "error": str(e)}
        return wrapper
    return decorator

Graceful Degradation & Observability

Graceful Degradation

  1. Tool returns structured error
  2. LLM explains the issue to user
  3. Agent can suggest alternatives
  4. Conversation continues

“I couldn’t fetch the stock price right now. Would you like me to try a different source?”

Observability

import logging
logger = logging.getLogger(__name__)

# Log every tool execution
logger.info(f"Executing {tool_name}")
logger.info(f"Args: {arguments}")
logger.info(f"Result: {result}")
logger.error(f"Failed: {error}")

Log what was called, with what args, and what happened.
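Those log lines can be wrapped around any executor so nothing is missed. A minimal sketch (`with_logging` is an illustrative name; `executor` is any callable returning the `{success, result, error}` dict):

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def with_logging(executor):
    """Wrap a tool executor so every call, arg set, and outcome is logged."""
    def logged(tool_name, arguments):
        logger.info("Executing %s with args %s", tool_name, arguments)
        result = executor(tool_name, arguments)
        if result.get("success"):
            logger.info("Result: %s", result.get("result"))
        else:
            logger.error("Failed: %s", result.get("error"))
        return result
    return logged
```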

Error Handling Checklist

  • Schema: enums + required fields prevent bad inputs
  • Parsing: try/except around json.loads()
  • Execution: structured {success, result, error} returns
  • External APIs: retries + exponential backoff + timeouts
  • Logging: every call logged for debugging
  • User-facing: LLM explains errors in natural language

E. Wrap-up

Key Takeaways

  1. Tool calling lets LLMs request actions — they never execute code themselves.
  2. JSON Schema is the standard for defining tool interfaces.
  3. Description engineering determines whether the model calls your tool correctly.
  4. Enums and required fields dramatically improve tool-calling accuracy.
  5. Structured error returns keep the agent resilient — never let tools crash.
  6. The two-call pattern: send tools → parse calls → execute → feed results back → get final answer.

Up Next

Lab 1 — Schema Gym: Design and validate schemas hands-on (no API keys needed).

Lab 2 — Calculator Tool: Build a complete tool-calling agent with OpenAI integration.